On K-word Proximity Search
نویسنده
چکیده
When we search from a huge amount of documents, we often specify several keywords to narrow the result of the search. Though the searched documents contain all keywords, positions of the keywords are usually not considered. As the result, the search result contains some meaningless documents. In this paper, we propose two algorithms for nding documents in which all given keywords appear in neighboring places. One is based on plane-sweep algorithm and the other is based on divide-and-conquer approach. Both algorithms run in O(n log n) time where n is the number of occurrences of given keywords. We implement above algorithms and verify their eectiveness.
منابع مشابه
The Semantics of the Word Istikbar (Arrogance) in the Holy Quran based on Syntagmatic Relations(A Case Study of Semantic Proximity and Semantic Contrast)
The word istikbar (arrogance) is one of the key words in the monotheistic system of the Quran, which has found a special status as a special feature of the opponents and adversaries of the call to the truth. Given the prominent role of this issue in the human life system and its provision of corruption and moral deviations, it is necessary to represent the nature of the elements that make up th...
متن کاملFast Algorithms for k-word Proximity Search
When we search from a huge amount of documents, we often specify several keywords and use conjunctive queries to narrow the result of the search. Though the searched documents contain all keywords, positions of the keywords are usually not considered. As a result, the search result contains some meaningless documents. It is therefore effective to rank documents according to proximity of keyword...
متن کاملText Retrieval by Using k-word Proximity Search
When we search from a huge amount of documents, we often specify several keywords and use conjunctive queries to narrow the result of the search. Though the searched documents contain all keywords, positions of the keywords are usually not considered. As the result, the search result contains some meaningless documents. It is therefore e ective to rank documents according to proximity of keywor...
متن کاملA Space-Efficient Frameworks for Top-k String Retrieval
The inverted index is the backbone of modern web search engines. For each word in a collection of web documents, the index records the list of documents where this word occurs. Given a set of query words, the job of a search engine is to output a ranked list of the most relevant documents containing the query. However, if the query consists of an arbitrary string — which can be a partial word, ...
متن کاملAdjacency and Proximity Searching in the Science Citation Index and Google
We have developed simple algorithms that allow adjacency and proximity searching in Google and the Science Citation Index (SCI). The SCI algorithm exploits the fact that SCI stopwords in a search phrase function as a placeholder. Such a phrase serves effectively as a fixed adjacency condition determined by the number n of adjacent stopwords (i.e., retrieve all records where word A and word B ar...
متن کامل